logo

You can find all files in github.

1 Before we start

Before you start downloading data from the internet, you need to create a project in R-Studio. The name of the project, as well as its file path should NOT include the following:
1. Greek letters
2. spaces (replace the spaces with _)

For example your file path could be: C:/Environmental_Data_Analysis/Geographical_Data.

You also need to install and load all the packages that we are going to use today1.

2 Geographical data

2.1 Introduction

In general, geographical data can be divided in two main categories:

1. Vector data
2. Raster data

2.1.1 Vector data

Vector data can be further subdivided into three categories:

1. Points
2. Lines
3. Polygons

These data have discrete, well-defined borders and usually have a high level of precision, but are not necessarily accurate.

Points located within a coordinate reference system (CRS) constitude the basis of vector data. Points can either be self-standing features (e.g., Ioannina city) or they can be linked together to form more complex geometries, such as lines (e.g., the national road connecting Ioannina to Thessaloniki) and polygons (e.g., the area covered by the city of Ioannina)2.

The main packages we are going to use when dealing with vector data are the following:

1. sf
2. sp
3. rgeos
4. rgdal

You can see the sf vignettes here and the sp vignettes here.

A great resource for spatial data analysis is this book by Bivand et al. (2008), as well as this one3.

2.1.2 Raster data

On the other hand, raster data divide the surface up into cells of constant size (e.g., 1 x 1 km2 cells). Raster datasets are the basis of background images used in web-mapping and have been a vital source of geographic data since the origins of aerial photography and satellite-based remote sensing devices. Rasters aggregate spatially specific features to a given resolution, meaning that they are consistent over space and scalable (Lovelace et al., 2019).

The geographic raster data model usually consists of a raster header and a matrix (with rows and columns) representing equally spaced cells (i.e., pixels). The raster header defines the coordinate reference system, the extent and the origin. The header defines the extent via the number of columns, the number of rows and the cell size resolution. Hence, starting from the origin, we can easily access and modify each single cell by either using the ID of a cell or by explicitly specifying the rows and columns. In contrast to vector data, the cell of one raster layer can only hold a single value. The value might be numeric or categorical.

Raster maps usually represent continuous phenomena, such as elevation, temperature, population density or spectral data. Discrete features such as soil or land-cover classes can also be represented as raster data. Consequently, the discrete borders of these features become blurred, and depending on the spatial task a vector representation might be more suitable.

In general, there are three raster classes:

1. RasterLayer
2. RasterBrick
3. RasterStack

The RasterLayer class represents the simplest form of a raster object, and consists of only one layer.

Both RasterBrick and RasterStack classes can handle multiple layers, but differ regarding the number of supported file formats, type of internal representation and processing speed.

A RasterBrick consists of multiple layers, which typically correspond to a single multispectral satellite file or a single multilayer object in memory.

A RasterStack is similar to a RasterBrick in the sense that it consists also of multiple layers. However, in contrast to RasterBrick, RasterStack allows you to connect several raster objects stored in different files or multiple objects in memory. More specifically, a RasterStack is a list of RasterLayer objects with the same extent and resolution (Lovelace et al., 2019).

3 Download vector data

3.2 Polygon data

3.2.1 Country boundaries

Then we can download some vector data in the form of polygons. We can use the online database GADM to download country boundaries.

We can easily plot the data we have just downloaded.

It is equally easy to subset those data.

## class       : SpatialPolygonsDataFrame 
## features    : 326 
## extent      : 19.37236, 29.6457, 34.80069, 41.74801  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 16
## names       : GID_0, NAME_0,   GID_1,                      NAME_1,                                                                                        NL_NAME_1,     GID_2,         NAME_2,                        NL_NAME_2,       GID_3,   NAME_3,          VARNAME_3,                                                     NL_NAME_3, TYPE_3,    ENGTYPE_3, CC_3, ... 
## min values  :   GRC, Greece, GRC.1_1,                      Aegean,                                                                 <U+0386><U+03B8><U+03C9><U+03C2>, GRC.1.1_1,          Athos, <U+0386><U+03B8><U+03C9><U+03C2>, GRC.1.1.1_1,   Abdera, Abelokipi-Menemeni,                              <U+0386><U+03B8><U+03C9><U+03C2>,  Dímos, Municipality,   NA, ... 
## max values  :   GRC, Greece, GRC.8_1, Thessaly and Central Greece, Tessa<U+03BB><U+03AF>a <U+03BA>a<U+03B9> Ste<U+03C1>e<U+03AC> <U+0395><U+03BB><U+03BB><U+03AC>da, GRC.8.2_1, West Macedonia,           Tessa<U+03BB><U+03AF>a, GRC.8.2.9_1, Zografou,           Zografos, Tessa<U+03BB><U+03BF><U+03BD><U+03AF><U+03BA><U+03B7><U+03C2>,  Dímos, Municipality,   NA, ...

We see that our object has a Coordinate Reference System (crs) and that it includes several columns (as indicated by the slot names).

Next, we are going to see how many columns our object includes:

##  [1] "GID_0"     "NAME_0"    "GID_1"     "NAME_1"    "NL_NAME_1" "GID_2"    
##  [7] "NAME_2"    "NL_NAME_2" "GID_3"     "NAME_3"    "VARNAME_3" "NL_NAME_3"
## [13] "TYPE_3"    "ENGTYPE_3" "CC_3"      "HASC_3"

Finally, we are going to create a new spatial object by subsetting Greece. We do so, by using one of the columns present in Greece.

Let’s plot Crete now

Let’s load some data for Italy now, this time using the sf R package.

Please note that you will have to change the file path.

3.2.2 Protected areas

We can also download data from the World Database on Protected areas, the most comprehensive global dataset of protected areas. It is used to monitor the performance of existing protected areas and pinpoint priority areas for establishing new protected areas. We will use the wdpar package to download data for Greece.

If you need more information on what these columns mean, please refer to the WPDA manual.

We need to change the CRS for plotting. Currently the CRS of gr_raw_pa_data is set to Greek Grid (else known as EGSA 87). We need to change it to WGS 84 (also known as World Geodetic System). We will use the function st_transform from the sf package. The number 4326 refers to WGS 84 in the European Data Petroleum System or EPSG.

As a next step, we will clip the terrestrial protected areas to the Greek coastline. First, in order to be on the safe side, we will perform some sanity checks on Greek boundaries and then we will do the clipping.

##===========================================================================##
## Sanity checks
##===========================================================================##
gr_boundary_data <- Greece %>% st_as_sf() %>% 
  st_set_precision(1000) %>%
  st_make_valid() %>%
  st_set_precision(1000) %>%
  st_combine() %>%
  st_union() %>%
  st_set_precision(1000) %>%
  st_make_valid() %>%
  st_transform(st_crs(gr_pa_data)) %>%
  st_make_valid()
##===========================================================================##


##===========================================================================##
## clip Greece's protected areas to the coastline
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
  filter(MARINE == "terrestrial") %>%
  st_intersection(gr_boundary_data) %>%
  rbind(gr_pa_data %>%
          filter(MARINE == "marine") %>%
          st_difference(gr_boundary_data)) %>%
  rbind(gr_pa_data %>% filter(!MARINE %in% c("terrestrial",
                                              "marine"))) 
##===========================================================================##

Now, we can recalculate the extent of each protected area.

As a next step, we can also calculate the percentage of land inside its protected area system that are managed under different categories, as defined by IUCN).

##===========================================================================##
## Calculate the percentage of land inside the PA system for each IUCN category
##===========================================================================##
statistic <- gr_pa_data %>%
  as.data.frame() %>%
  dplyr::select(-geometry) %>%
  group_by(IUCN_CAT) %>%
  summarize(area_km = sum(AREA_KM2)) %>%
  ungroup() %>%
  mutate(percentage = (area_km / sum(area_km)) * 100) %>%
  arrange(desc(area_km))
##===========================================================================##


##===========================================================================##
## Print it
##===========================================================================##
statistic

Now we can keep ony those PAs that have been assigned to each of the IUCN categories (I-VI) and discard all other PAs.

Let’s visualise the results.

Now we can restrict the analysis to Crete. We will clip the PA data to the extent of Crete. In order to do so, we will use the st_intersection function from the sf package and then we will visualise the results.

Finally, if we want to see all the PAs in Crete, we will do the following:

3.3 Point data

As we have mentioned earlier, the most basic category of vector data are points. In ecology, points most often represent sampling locations or field observation in general. These observations consist primarily species presence in one site. Nowadays, it is fairly easy to obtain species occurrence data through freely available, online databases, such as GBIF, BIEN and iNaturalist. Bear in mind these databases refer to current/recent species occurrences, yet there are others, such as the Paleobiology and Neotoma databases, that contain information regarding fossil data.

First, let’s load the data we have cleaned for Italy.

In order for this to run smoothly, load the rds data from Github.

3.4 Subset point data based on polygons

We can now see if there are any points occurring in a specific polygon (e.g., in any given Italian province). We will use spatial subsetting for this. Spatial subsetting is the process of selecting features of a spatial object based on whether or not they in some way relate in space to another object.
But before we do this spatial subsetting, we must load an excel file that contains only the valid binomials and not taxa above the species level and below the subspecies level (e.g., families, genera, varieties)4.

## ============================================================================##
## First load the excel file with the correct species names
## ============================================================================##
nms <- readxl::read_excel("italian_names.xlsx")
## ============================================================================##


## ============================================================================##
## Keep only one instance per taxon from our occurrence data
## ============================================================================##
nms_flags <- data_it %>% distinct(scientificName) %>% mutate(Taxon = nms$scientificName, 
    action = nms$action)
## ============================================================================##


## ============================================================================##
## Then filter out any incorrect taxa
## ============================================================================##
data_it <- data_it %>% full_join(nms_flags) %>% as_tibble() %>% filter(!str_detect(action, 
    "DEL"))
## ============================================================================##


## ============================================================================##
## Then convert the occurrence data from GBIF to a spatial object
## ============================================================================##
data_it_sf <- st_as_sf(data_it, coords = c("decimalLongitude", "decimalLatitude"), 
    crs = 4326)
## ============================================================================##


## ============================================================================##
## Then create an object representing Pescara and another one representing Padova
## ============================================================================##
Pescara <- Italy %>% filter(NOME_PRO == "PESCARA")
Padova <- Italy %>% filter(NOME_PRO == "PADOVA")
## ============================================================================##

## ============================================================================##
## Then use it for spatial subsetting to return all the occurrences in Sfakia and
## in Rethymno
## ============================================================================##
occs_Pescara <- data_it_sf[Pescara, ]
occs_Padova <- data_it_sf[Padova, ]
## ============================================================================##


## ============================================================================##
## Use it again to obtain the occurrences NOT in Sfakia
## ============================================================================##
occs_not_in_Padova <- data_it_sf[Padova, , op = st_disjoint]
## ============================================================================##

Note the empty argument denoted with , , in the preceding code chunk is included to highlight op, the third argument in [ for sf objects. By using st_disjoint, we selected all the occurrences that are not present in the province of Padova.

Let’s see what we have created.

We can also calculate how many occurrences and how many species each Italian province contains.

4 Download raster data

Nowadays, there are several freely available raster databases for climate (e.g., WorldClim, CHELSA, Oscillayers, PaleoClim, CliMond, MerraClim, TerraClimate, EuMedClim, Envirem), soil (SoilGrids), topographical (EarthEnv, CGIAR, MERIT), geological (e.g., GLIM) and LULC (Land Cover/Land Use - Luh) data. Of course there are many other databases dealing with NVDI data, habitat heterogeneity, human population density etc.

Today we will deal with climate data from the WorldClim database, as these data can be accessed rather easily through the R interface. WorldClim provides data for 19 environmental variables that have to with temperature and precipitation:

WorldClim environmental variables

Var. Description
BIO1 Annual Mean Temperature
BIO2 Mean Diurnal Range
BIO3 Isothermality (BIO2/BIO7) (* 100)
BIO4 Temperature Seasonality (standard deviation *100)
BIO5 Max Temperature of Warmest Month
BIO6 Min Temperature of Coldest Month
BIO7 Temperature Annual Range (BIO5-BIO6)
BIO8 Mean Temperature of Wettest Quarter
BIO9 Mean Temperature of Driest Quarter
BIO10 Mean Temperature of Warmest Quarter
BIO11 Mean Temperature of Coldest Quarter
BIO12 Annual Precipitation
BIO13 Precipitation of Wettest Month
BIO14 Precipitation of Driest Month
BIO15 Precipitation Seasonality (Coefficient of Variation)
BIO16 Precipitation of Wettest Quarter
BIO17 Precipitation of Driest Quarter
BIO18 Precipitation of Warmest Quarter
BIO19 Precipitation of Coldest Quarter

We can download future and past climate data from WorldClim as well, but we need to specify the Global Circulation Model, the Representative Pathway Scenario and the time period (2050 or 2070) we are interested in. Depending on our area of interest, we should choose carefully which GCMs are better suited for our intended analysis (McSweeney et al., 2015).

Please bear in mind that the following commands may take a while, since we will be downloading ca. 124 MB.

Maybe it would be a good idea NOT to run the following commands now, if your internet connection is not very fast (it may take more than 10 mins) - You can change the res = 2.5 to res = 10 if you want to speed it up, but bear in mind that the resolution will be very coarse.

Let’s now change the name of the variables in each object we created.

We need to correct the scale of bio_1 (i.e., Mean annual temperature). You can see here the reason why.

Next we will download elevation data for Italy.

4.1 Crop raster data

We can crop the raster data we have just downloaded to the extent of Italy and then to the extent of some of its provinces. After we crop climate data to the extent of Italy, we can merge them with the altitudinal data. We will be able to do so, because then and only then will both objects (climate and altitude) have the same extent.

You can do the same for the future climate data if you want.

Let’s now plot our results.

4.2 Change the resolution

If for some reason, we want to change the resolution of our rasters, we can use the aggregate function5:

4.3 Calculate some statistics

The package raster contains functions for extracting descriptive statistics for entire rasters. Printing a raster object to the console by typing its name returns minimum and maximum values of a raster. The function summary() provides common descriptive statistics (minimum, maximum, quartiles and number of NAs). Further summary operations such as the standard deviation or custom summary statistics can be calculated with the function cellStats().

Raster value statistics can be visualized in a variety of ways. Specific functions such as boxplot(), density(), hist() and pairs() work also with raster objects

##                bio1      bio10       bio11      bio12      bio13       bio14
## Min.       -5.50000     9.0000  -112.00000   290.0000    44.0000     0.00000
## 1st Qu.    11.10000   193.0000    25.00000   676.0000    88.0000    20.00000
## Median     13.00000   215.0000    45.00000   813.5000   102.0000    36.00000
##               bio15      bio16      bio17      bio18      bio19       bio2
## Min.        7.00000   117.0000     6.0000    22.0000    83.0000    42.0000
## 1st Qu.    22.00000   237.0000    80.0000    90.0000   168.0000    72.0000
## Median     28.00000   275.0000   131.0000   151.0000   196.0000    82.0000
##                bio3     bio4       bio5         bio6       bio7       bio8
## Min.       21.00000  4524.00    42.0000  -145.000000   171.0000  -104.0000
## 1st Qu.    29.00000  5698.00   253.0000   -17.000000   235.0000    99.0000
## Median     31.00000  6110.00   278.0000     8.000000   254.0000   121.0000
##               bio9 ITA_msk_alt
## Min.     -103.0000     -6.3125
## 1st Qu.    46.0000    110.6849
## Median    201.0000    337.8559
##  [ reached getOption("max.print") -- omitted 4 rows ]

You can do the same for other variables as well and you can use the function plot3D() to visualize the altitudinal data.

4.4 Correlation among raster data

As you may know, if we want to analyse some ecological data, we should not include in our analysis collinear independent variables (IVs). The correlation coefficient of our IVs should not exceed \(> 0.7\). We will now check the collinearity of our IVs and see which of them do not present any collinearity issues.

## 11 variables from the 20 input variables have collinearity problem: 
##  
## bio6 bio10 bio17 bio16 bio11 bio5 bio7 bio1 bio18 bio12 bio14 
## 
## After excluding the collinear variables, the linear correlation coefficients ranges between: 
## min correlation ( bio9 ~ bio2 ):  -0.04715004 
## max correlation ( bio4 ~ bio15 ):  -0.6800702 
## 
## ---------- VIFs of the remained variables -------- 
##     Variables       VIF
## 1       bio13  4.048064
## 2       bio15  3.052235
## 3       bio19  4.497012
## 4        bio2 42.962093
## 5        bio3 41.237207
## 6        bio4 32.253340
## 7        bio8  4.109454
## 8        bio9  5.322188
## 9 ITA_msk_alt  6.976100

4.5 Raster extraction

Raster extraction is the process of identifying and returning the values associated with a target raster at specific locations, based on a (typically vector) geographic selector object. The results depend on the type of selector used (points, lines or polygons) and arguments passed to the raster::extract() function.

The simplest example is extracting the value of a raster cell at specific points. For this purpose, we will use the point data we subsetted earlier, occs_Padova and occs_Pescara. We will then compare if the species that are both present in these two spatial subsets have significantly different environmental preferences.

Please note that the exact_extract function from the exactextractr R package is way faster than the extract function from the raster R package, if we want to extract data based on polygons.

## ============================================================================##
## Then we convert the sp objects to sf objects
## ============================================================================##
occs_Padova_sf <- occs_Padova %>% st_as_sf()
occs_Pescara_sf <- occs_Pescara %>% st_as_sf()
## ============================================================================##


## ============================================================================##
## Extract abiotic data
## ============================================================================##
Padova_data <- extract(current_climate_crop, occs_Padova_sf, sp = T, df = T) %>% 
    as.data.frame()

Pescara_data <- extract(current_climate_crop, occs_Pescara_sf, sp = T, df = T) %>% 
    as.data.frame()
## ============================================================================##


## ============================================================================##
## We create a dataframe containing the mean values for each taxon for the annual
## temperature and precipitation
## ============================================================================##
Padova_data_mean <- Padova_data %>% group_by(Taxon) %>% summarise(bio1_mean = mean(bio1), 
    bio2_mean = mean(bio2))

Pescara_data_mean <- Pescara_data %>% group_by(Taxon) %>% summarise(bio1_mean = mean(bio1), 
    bio2_mean = mean(bio2))
## ============================================================================##


## ============================================================================##
## We make sure that we keep ONLY those species that are both present in Padova
## and Pescara
## ============================================================================##
Padova_subset <- setDT(Padova_data_mean)[Taxon %chin% Pescara_data_mean$Taxon]
Pescara_subset <- setDT(Pescara_data_mean)[Taxon %chin% Padova_data_mean$Taxon]
Padova_subset$Taxon <- factor(Padova_subset$Taxon)
Pescara_subset$Taxon <- factor(Pescara_subset$Taxon)
## ============================================================================##


## ============================================================================##
## Check if they are identical
## ============================================================================##
identical(sort(unique(Padova_subset$Taxon)), sort(unique(Pescara_subset$Taxon)))
## [1] TRUE
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Province      1   29.8  29.848   26.51 3.54e-07 ***
## Residuals   610  686.8   1.126                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 40 observations deleted due to missingness
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Province      1   4023    4023   303.6 <2e-16 ***
## Residuals   610   8084      13                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 40 observations deleted due to missingness

Did we find any significant differences?

5 Appendix: R code

##===========================================================================##
## Do NOT run this code
##===========================================================================##
library(knitr)
library(rmdformats)

## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
                 cache=TRUE,
               prompt=FALSE,
               tidy=TRUE,
               # comment=NA,
               message=FALSE,
               warning=FALSE,
               eval = TRUE)
opts_knit$set(width=75)
##===========================================================================##
pre {
  max-height: 400px;
  overflow-y: auto;
}

pre[class] {
  max-height: 400px;
}
##===========================================================================##
## Load them
##===========================================================================##
library(tidyverse)
library(rgbif)
library(raster)
library(usdm)
library(rgeos)
library(rgdal)
library(rinat)
library(wdpar)
library(countrycode)
library(devtools)
library(sf)
library(dismo)
library(CoordinateCleaner)
library(scales)
library(scrubr)
library(mapr)
library(rasterVis)
library(data.table)
##===========================================================================##
##===========================================================================##
## Download the data for Greece
##===========================================================================##
## There are three levels available, you are free to explore them
Greece <-  raster::getData('GADM', country = 'GRC', level = 3)
##===========================================================================##
##===========================================================================##
## Plot Greece
##===========================================================================##
plot(Greece)
##===========================================================================##
## First we need to see what is inside the object we created
##===========================================================================##
Greece
names(Greece)
Crete <- subset(Greece, NAME_1 == "Crete")
##===========================================================================##
## Plot Crete
##===========================================================================##
plot(Crete)

##===========================================================================##
## Load the Italian provinces
##===========================================================================##
Italy <- read_sf('Italin provinces.shp') %>% 
  st_transform(4326)

##===========================================================================##
## Download protected area data for Greece (it might take a while)
##===========================================================================##
gr_raw_pa_data <- wdpa_fetch("Greece", wait = TRUE)
##===========================================================================##

##===========================================================================##
## Let's see what we have downloaded
##===========================================================================##
# head(gr_raw_pa_data)
##===========================================================================##
## Reproject data to longitude/latitude for plotting
##===========================================================================##
gr_pa_data <- st_transform(gr_raw_pa_data, 4326)
htmltools::img(src = knitr::image_uri(file.path("G:/Academic/Lessons/EMB/Labs/Labs/TRUE RMD Files/HBS Seminar/Logo_HBS/ebe-LOGO-005-300_1_70.jpg")), 
               alt = 'logo', 
               style = 'position:absolute; top:0; right:0; padding:5px;')
##===========================================================================##
## Recalculate extent
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
  mutate(AREA_KM2 = as.numeric(st_area(.)) * 1e-6) ## What is this number?
##===========================================================================##
##===========================================================================##
## Calculate the percentage of land inside the PA system for each IUCN category
##===========================================================================##
statistic <- gr_pa_data %>%
  as.data.frame() %>%
  dplyr::select(-geometry) %>%
  group_by(IUCN_CAT) %>%
  summarize(area_km = sum(AREA_KM2)) %>%
  ungroup() %>%
  mutate(percentage = (area_km / sum(area_km)) * 100) %>%
  arrange(desc(area_km))
##===========================================================================##


##===========================================================================##
## Print it
##===========================================================================##
statistic
##===========================================================================##
## Keep certain PAs only
##===========================================================================##
gr_pa_data_filtered <- gr_pa_data %>% filter(!str_detect(IUCN_CAT, 'Not'))
##===========================================================================##
##===========================================================================##
## First convert Greece and Crete to sf objects
##===========================================================================##
Greece_sf <- st_as_sf(Greece)
Crete_sf <- st_as_sf(Crete)
##===========================================================================##


##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Greece_sf) +
  geom_sf(fill = "antiquewhite1") +
  geom_sf() +
  geom_sf(data = Greece_sf, fill = NA) +
  geom_sf(aes(fill = IUCN_CAT), data = gr_pa_data_filtered, inherit.aes = FALSE) +
  theme(axis.title = element_blank(), legend.position = "bottom")
##===========================================================================##
## Restrict the analysis to Crete
##===========================================================================##
pa_crete <- st_intersection(gr_pa_data_filtered, Crete_sf)
##===========================================================================##


##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Crete_sf) +
  geom_sf(fill = "antiquewhite1") +
  geom_sf() +
  geom_sf(data = Crete_sf, fill = NA) +
  geom_sf(aes(fill = IUCN_CAT), data = pa_crete, inherit.aes = FALSE) +
  theme(axis.title = element_blank(), legend.position = "bottom")
##===========================================================================##
## Restrict the analysis to Crete
##===========================================================================##
pa_crete_all <- st_intersection(gr_pa_data, Crete_sf)
##===========================================================================##


##===========================================================================##
## Then do the plotting
##===========================================================================##
ggplot(Crete_sf) +
  geom_sf(fill = "antiquewhite1") +
  geom_sf() +
  geom_sf(data = Crete_sf, fill = NA) +
  geom_sf(aes(fill = IUCN_CAT), data = pa_crete_all, inherit.aes = FALSE) +
  theme(axis.title = element_blank(), legend.position = "bottom")

data_it <- readRDS('Italian data cleaned.rds')
##============================================================================##
## First load the excel file with the correct species names
##============================================================================##
nms <- readxl::read_excel('italian_names.xlsx')
##============================================================================##


##============================================================================##
## Keep only one instance per taxon from our occurrence data
##============================================================================##
nms_flags <- data_it %>% 
  distinct(scientificName) %>% 
  mutate(Taxon = nms$scientificName,
         action = nms$action)
##============================================================================##


##============================================================================##
## Then filter out any incorrect taxa
##============================================================================##
data_it <- data_it %>% 
  full_join(nms_flags) %>% 
  as_tibble() %>% 
  filter(!str_detect(action, 'DEL'))
##============================================================================##


##============================================================================##
## Then convert the occurrence data from GBIF to a spatial object
##============================================================================##
data_it_sf <- st_as_sf(data_it, 
                    coords = c('decimalLongitude', 
                               'decimalLatitude'), 
                               crs = 4326)
##============================================================================##


##============================================================================##
## Then create an object representing Pescara and another one representing
## Padova
##============================================================================##
Pescara <- Italy %>% filter(NOME_PRO == 'PESCARA')
Padova <- Italy %>% filter(NOME_PRO == 'PADOVA')
##============================================================================##

##============================================================================##
## Then use it for spatial subsetting to return all the occurrences in Sfakia and
## in Rethymno
##============================================================================##
occs_Pescara <- data_it_sf[Pescara, ]
occs_Padova <- data_it_sf[Padova, ]
##============================================================================##


##============================================================================##
## Use it again to obtain the occurrences NOT in Sfakia
##============================================================================##
occs_not_in_Padova <- data_it_sf[Padova, , op = st_disjoint]
##============================================================================##
##============================================================================##
## First we plot Italy
##============================================================================##
plot(st_geometry(Italy))
##============================================================================##


##============================================================================##
## Then we plot Padova, after we convert the polygon into a polyline with st_cast
##============================================================================##
plot(st_geometry(st_cast(Padova, 'MULTILINESTRING')), add = T, col = 'red')
##============================================================================##


##============================================================================##
## Then we plot the occurrences present in Padova
##============================================================================##
plot(st_geometry(occs_Padova %>% dplyr::sample_n(100)), add = T, col = 'blue')
##============================================================================##


##============================================================================##
## And finally we plot the occurrences not present in Padova
##============================================================================##
plot(st_geometry(occs_not_in_Padova %>% dplyr::sample_n(1000)), add = T, col = 'darkgreen')
aggr_data <- readRDS('Italian species data.rds')
##============================================================================##
## First we will create a new spatial object containing both the point and the
## polygon data
##============================================================================##
aggr_data <- Italy %>%
  st_join(data_it_sf) %>%
  dplyr::group_by(NOME_PRO) %>% ## Here we group the data based on the provincial names
  
  ## First we create a variable containing the sum of the species in each province
  dplyr::summarize(Species_Number = sum(NROW(unique(Taxon))),
            
            ## Then we just sum the number of occurrences in each province
            Occurrence_Number = n())
##============================================================================##


##============================================================================##
## Save it to disk
##============================================================================##
saveRDS(aggr_data, 'Italian species data.rds')
##============================================================================##
## Let's first plot the occurence number in each Italian province
##============================================================================##
plot(aggr_data['Occurrence_Number'])


##============================================================================##
## And then plot the species number in each Italian province
##============================================================================##
plot(aggr_data['Species_Number'])
##============================================================================##
## First, let's download current climate data for the whole planet
##============================================================================##
current_climate <- raster::getData('worldclim', var = 'bio', res = 2.5)
##============================================================================##


##============================================================================##
## Then, let's download future climate data
##============================================================================##
future_climate <- raster::getData('CMIP5', var = 'bio', res = 2.5, 
                          rcp = 85, model = 'CC', year = 70)

##============================================================================##
names(current_climate) <- c('bio_1', 'bio_2', 'bio_3', 'bio_4', 'bio_5', 
                               'bio_6', 'bio_7', 'bio_8', 'bio_9', 'bio_10', 
                               'bio_11', 'bio_12', 'bio_13', 'bio_14', 'bio_15', 
                               'bio_16', 'bio_17', 'bio_18', 'bio_19')

names(future_climate) <- c('bio_1', 'bio_2', 'bio_3', 'bio_4', 'bio_5', 
                               'bio_6', 'bio_7', 'bio_8', 'bio_9', 'bio_10', 
                               'bio_11', 'bio_12', 'bio_13', 'bio_14', 'bio_15', 
                               'bio_16', 'bio_17', 'bio_18', 'bio_19')
gain(current_climate$bio_1) = 0.1

gain(future_climate$bio_1) = 0.1
altitude <- raster::getData('alt', country = 'ITA', mask = TRUE) 
current_climate_crop <- readRDS('Italy WC.rds')
altitude <- raster('ITA_msk_alt.grd')
##============================================================================##
## First for Italy
##============================================================================##
current_climate_crop <- crop(current_climate, 
                            Italy, 
                            snap = 'in') %>% 
  mask(., Italy)

altitude <- crop(altitude, 
                 Italy, 
                 snap = 'in') %>% 
  mask(., Italy) %>% 
  
  ## We are forcing the extent of altitude to be exactly the same as that of the
  ## climate data
  resample(., current_climate_crop, 'bilinear')
##============================================================================##


##============================================================================##
## Merge (stack) climate and altitude
##============================================================================##
current_climate_crop <- stack(current_climate_crop, altitude)
##============================================================================##


##============================================================================##
## Then for Padova
##============================================================================##
current_climate_Padova <- crop(current_climate_crop,
                              Padova,
                              snap = 'in') %>% 
  mask(., Padova)
##============================================================================##
plot(current_climate_crop[[1]])
title('Mean annual temperature in Italy')
##===========================================================================##
## Sanity checks
##===========================================================================##
gr_boundary_data <- Greece %>% st_as_sf() %>% 
  st_set_precision(1000) %>%
  st_make_valid() %>%
  st_set_precision(1000) %>%
  st_combine() %>%
  st_union() %>%
  st_set_precision(1000) %>%
  st_make_valid() %>%
  st_transform(st_crs(gr_pa_data)) %>%
  st_make_valid()
##===========================================================================##


##===========================================================================##
## clip Greece's protected areas to the coastline
##===========================================================================##
gr_pa_data <- gr_pa_data %>%
  filter(MARINE == "terrestrial") %>%
  st_intersection(gr_boundary_data) %>%
  rbind(gr_pa_data %>%
          filter(MARINE == "marine") %>%
          st_difference(gr_boundary_data)) %>%
  rbind(gr_pa_data %>% filter(!MARINE %in% c("terrestrial",
                                              "marine"))) 
##===========================================================================##
##============================================================================##
## First, some main summary statistics
##============================================================================##
cellStats(current_climate_crop, 'summary')


##============================================================================##
## Then some density plots for some variables
##============================================================================##
density(current_climate_crop[[1:2]])


##============================================================================##
## And some boxplots for the same variables
##============================================================================##
boxplot(current_climate_crop[[1:2]])

##============================================================================##
## We can also calculate the correlation of those variables
##============================================================================##
pairs(current_climate_crop[[1:2]])
vifcor(current_climate_crop %>% as.data.frame(), th = 0.7)
##============================================================================##
## Then we convert the sp objects to sf objects
##============================================================================##
occs_Padova_sf <- occs_Padova %>% st_as_sf()
occs_Pescara_sf <- occs_Pescara %>% st_as_sf()
##============================================================================##


##============================================================================##
## Extract abiotic data
##============================================================================##
Padova_data <- extract(current_climate_crop, occs_Padova_sf, sp = T, df = T) %>% 
  as.data.frame()

Pescara_data <- extract(current_climate_crop, occs_Pescara_sf, sp = T, df = T) %>% 
  as.data.frame()
##============================================================================##


##============================================================================##
## We create a dataframe containing the mean values for each taxon for the annual
## temperature and precipitation
##============================================================================##
Padova_data_mean <- Padova_data %>% 
  group_by(Taxon) %>% 
  summarise(bio1_mean = mean(bio1),
            bio2_mean = mean(bio2))

Pescara_data_mean <- Pescara_data %>% 
  group_by(Taxon) %>% 
  summarise(bio1_mean = mean(bio1),
            bio2_mean = mean(bio2))
##============================================================================##


##============================================================================##
## We make sure that we keep ONLY those species that are both present in Padova
## and Pescara
##============================================================================##
Padova_subset <- setDT(Padova_data_mean)[Taxon %chin% Pescara_data_mean$Taxon]
Pescara_subset <- setDT(Pescara_data_mean)[Taxon %chin% Padova_data_mean$Taxon]
Padova_subset$Taxon <- factor(Padova_subset$Taxon)
Pescara_subset$Taxon <- factor(Pescara_subset$Taxon)
##============================================================================##


##============================================================================##
## Check if they are identical
##============================================================================##
identical(sort(unique(Padova_subset$Taxon)), 
          sort(unique(Pescara_subset$Taxon)))


##============================================================================##
## Convert them to tibbles
##============================================================================##
Padova_subset_tbl <- Padova_subset %>% 
  arrange(Taxon) %>%
  as_tibble() %>% 
  mutate(Province = 'Padova')

Pescara_subset_tbl <- Pescara_subset %>%
  arrange(Taxon) %>% 
  as_tibble() %>% 
  mutate(Province = 'Pescara')

Padova_Pescara <- full_join(Padova_subset_tbl, Pescara_subset_tbl)
Padova_Pescara$Province <- factor(Padova_Pescara$Province)
##============================================================================##


##============================================================================##
## Finally, we conduct the ANOVA
##============================================================================##
summary(aov(bio1_mean ~ Province, data = Padova_Pescara))
summary(aov(bio2_mean ~ Province, data = Padova_Pescara))
##===========================================================================##
## Do NOT run this code
##===========================================================================##
labs = knitr::all_labels(echo = TRUE)
##===========================================================================##

References

Bivand, R.S., Pebesma, E.J., Gomez-Rubio, V., & Pebesma, E.J. (2008) Applied spatial data analysis with r. Springer,

Lovelace, R., Nowosad, J., & Muenchow, J. (2019) Geocomputation with r. CRC Press,

McSweeney, C.F., Jones, R.G., Lee, R.W., & Rowell, D.P. (2015) Selecting CMIP5 GCMs for downscaling over multiple regions. Clim. Dyn., 44, 3237–3260.


  1. There are two ways to do that fast:
    1. library(pacman); p_load('package_name_1', 'package_name_2')
    2. install.packages(c('package_name_1', 'package_name_2'), dependecies = TRUE)
    Remember that if you use the first way, you need to install package pacman first.

  2. If you want to deepen your understanding on this, please click here

  3. You are strongly advised to have a look to both these books

  4. This step has already been conducted for you prior to this practical.

  5. We can also use the disaggregate function if we want to have a finer resolution, but this is strongly discouraged

 

Delivered to you by Dr. Kostas Kougioumoutzis

kkougiou@aua.gr